Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD

نویسندگان

  • Peng Cao
  • Jinzhu Yang
  • Wei Li
  • Dazhe Zhao
  • Osmar R. Zaïane
چکیده

Classification plays a critical role in false positive reduction (FPR) in lung nodule computer aided detection (CAD). The difficulty of FPR lies in the variation of the appearances of the nodules, and the imbalance distribution between the nodule and non-nodule class. Moreover, the presence of inherent complex structures in data distribution, such as within-class imbalance and high-dimensionality are other critical factors of decreasing classification performance. To solve these challenges, we proposed a hybrid probabilistic sampling combined with diverse random subspace ensemble. Experimental results demonstrate the effectiveness of the proposed method in terms of geometric mean (G-mean) and area under the ROC curve (AUC) compared with commonly used methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hybrid probabilistic sampling with random subspace for imbalanced data learning

Class imbalance is one of the challenging problems for machine learning in many real-world applications. Other issues, such as within-class imbalance and high dimensionality, can exacerbate the problem. We propose a method HPSDRS that combines two ideas: Hybrid Probabilistic Sampling technique ensemble with Diverse Random Subspace to address these issues. HPS improves the performance of traditi...

متن کامل

An Effective Method for Imbalanced Time Series Classification: Hybrid Sampling

Most traditional supervised classification learning algorithms are ineffective for highly imbalanced time series classification, which has received considerably less attention than imbalanced data problems in data mining and machine learning research. Bagging is one of the most effective ensemble learning methods, yet it has drawbacks on highly imbalanced data. Sampling methods are considered t...

متن کامل

CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...

متن کامل

A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM

Class imbalance ubiquitously exists in real life, which has attracted much interest from various domains. Direct learning from imbalanced dataset may pose unsatisfying results overfocusing on the accuracy of identification and deriving a suboptimal model. Various methodologies have been developed in tackling this problem including sampling, cost-sensitive, and other hybrid ones. However, the sa...

متن کامل

A novel ensemble method for classifying imbalanced data

The class imbalance problems have been reported to severely hinder classification performance of many standard learning algorithms, and have attracted a great deal of attention from researchers of different fields. Therefore, a number of methods, such as sampling methods, cost-sensitive learning methods, and bagging and boosting based ensemble methods, have been proposed to solve these problems...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society

دوره 38 3  شماره 

صفحات  -

تاریخ انتشار 2014